The Smarter Crowd: Active Learning, Knowledge Corroboration, and Collective IQs

Authors

  • Thore Graepel
  • David Stern
  • Jurgen Van Gael

Abstract

Crowdsourcing mechanisms such as Amazon Mechanical Turk (AMT) or the ESP game are now routinely used to label data for machine learning and other computational intelligence applications. I will discuss three important aspects of crowdsourcing that can help us tap into this powerful new resource more efficiently.

When obtaining training data from a crowdsourcing system for the purpose of machine learning, we can either collect all the training data in one batch, or proceed sequentially and decide which labels to obtain based on the model learnt from the data labelled so far. The sequential method is often referred to as active learning. I will discuss which criteria can be used to select new examples for labelling, and demonstrate how this approach has been used in the FUSE/MSRC news recommender system projectemporia.com to categorise news stories cost-efficiently.

Data obtained from crowdsourcing systems is typically plentiful and cheap, but noisy. The redundancy in the data can be exploited to improve the quality of the inferred labels, using models that take into account the reliability and expertise of the workers as well as the nature and difficulty of the tasks. I will present an algorithm for such a corroboration process based on graphical models, and show its application to verifying the truth values of facts in the entity-relationship knowledge base Yago.

Finally, I will talk about some very recent results on how the parameters of crowdsourcing marketplaces (such as price and the track record required for participation) affect the quality of results. This work is based on methods from psychometrics, effectively measuring the IQ of the Mechanical Turk when it is viewed as a form of collective intelligence. This is joint work with Ralf Herbrich, Ulrich Paquet, David Stern, Jurgen Van Gael, Gjergji Kasneci, and Michal Kosinski.
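The abstract does not specify which selection criteria were used for projectemporia.com. Below is a minimal sketch that assumes one common criterion, uncertainty sampling: query the item whose predicted label has maximal entropy. In each round the loop retrains a model on the labels collected so far and sends only the most uncertain item to the crowd. The 1-D logistic regression model, the `oracle` callback standing in for a paid crowd worker, and all function names are hypothetical illustrations, not the system described in the talk.

```python
import math
from typing import Callable, Dict, List, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def entropy(p: float) -> float:
    # Binary entropy in nats: 0 for a confident prediction, log(2) at p = 0.5.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def fit_logistic(xs: List[float], ys: List[int],
                 epochs: int = 200, lr: float = 0.5) -> Tuple[float, float]:
    # Hypothetical stand-in model: 1-D logistic regression fitted by batch gradient descent.
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def select_most_uncertain(probs: Dict[int, float]) -> int:
    # Uncertainty sampling: pick the unlabelled item with maximal predictive entropy.
    return max(probs, key=lambda i: entropy(probs[i]))


def active_learning(pool: List[float], oracle: Callable[[int], int],
                    seed: List[int], budget: int) -> Dict[int, int]:
    # Start from a few seed labels, then buy one label per round (assumed crowd query).
    labelled = {i: oracle(i) for i in seed}
    for _ in range(budget):
        w, b = fit_logistic([pool[i] for i in labelled], list(labelled.values()))
        candidates = {i: sigmoid(w * x + b)
                      for i, x in enumerate(pool) if i not in labelled}
        if not candidates:
            break
        query = select_most_uncertain(candidates)
        labelled[query] = oracle(query)  # one paid crowd label
    return labelled
```

Labelling only the most uncertain item in each round concentrates the crowd budget near the decision boundary, where a new label is most likely to change the model. Items the model already classifies confidently are never paid for.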
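The abstract does not give the corroboration model itself. As a stand-in, the sketch below uses the classic one-coin Dawid-Skene approach. Expectation maximisation alternates between estimating one accuracy per worker and a posterior probability that each fact is true. This is a swapped-in standard technique and is simpler than the graphical model in the talk; for example, it ignores task difficulty. The `corroborate` function and the `(item, worker, vote)` label format are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# A label is (item_id, worker_id, vote) with vote in {0, 1}, e.g. a worker's
# judgement of whether a knowledge-base fact is true (hypothetical format).
Label = Tuple[str, str, int]


def corroborate(labels: List[Label], iterations: int = 50,
                eps: float = 1e-3) -> Tuple[Dict[str, float], Dict[str, float]]:
    by_item: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for item, worker, vote in labels:
        by_item[item].append((worker, vote))

    # Initialise P(fact is true) with the fraction of "true" votes (soft majority vote).
    q = {i: sum(v for _, v in vs) / len(vs) for i, vs in by_item.items()}
    acc: Dict[str, float] = {}

    for _ in range(iterations):
        # M-step: a worker's accuracy is the expected fraction of their votes
        # that agree with the (unknown) truth; clamp to avoid log(0).
        agree: Dict[str, float] = defaultdict(float)
        count: Dict[str, int] = defaultdict(int)
        for item, worker, vote in labels:
            agree[worker] += q[item] if vote == 1 else 1.0 - q[item]
            count[worker] += 1
        acc = {w: min(max(agree[w] / count[w], eps), 1 - eps) for w in count}
        prior = min(max(sum(q.values()) / len(q), eps), 1 - eps)

        # E-step: posterior that each fact is true, accumulated in log space.
        for item, vs in by_item.items():
            log_t, log_f = math.log(prior), math.log(1 - prior)
            for worker, vote in vs:
                a = acc[worker]
                log_t += math.log(a if vote == 1 else 1 - a)
                log_f += math.log(1 - a if vote == 1 else a)
            m = max(log_t, log_f)
            pt, pf = math.exp(log_t - m), math.exp(log_f - m)
            q[item] = pt / (pt + pf)
    return q, acc
```

The model can learn that a worker is almost always wrong. When it does, that worker's votes end up counting as evidence for the opposite answer instead of simply being discarded.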

Similar resources

Investigating the Cognitive Functions of Students with Stuttering

Objective: Stuttering is one of the most common speech disorders and causes many complications in children and adults. This disorder involves behavioral, cognitive and emotional interactions. So, the purpose of the current study is to investigate the cognitive functions of students with stuttering. Materials & Methods: A descriptive study, comprising 30 students (8 females and 22 males) fr...

Digital Art and Crowd Creation in Iran (Case Study: Tehran Annual Digital Art Exhibition)

This paper aims to show the status of digital art in Iran and explain how the meaning of an artist has transformed in the digital age. The primary assumption of this paper is that the experience of digital art has again revived the collective experience in creating arts. Although interactivity is considered to be the most important quality of digital art, their collective, collaborative and pr...

Wisdom of the crowd from unsupervised dimension reduction

Wisdom of the crowd, the collective intelligence derived from responses of multiple human or machine individuals to the same questions, can be more accurate than each individual, and improve social decision-making and prediction accuracy ([1, 2, 3, 4, 5]). This can also integrate multiple programs or datasets, each as an individual, for the same predictive questions. Crowd wisdom estimates each...

Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning

Crowd-sourcing has become a popular means of acquiring labeled data for many tasks where humans are more accurate than computers, such as image tagging, entity resolution, and sentiment analysis. However, due to the time and cost of human labor, solutions that rely solely on crowd-sourcing are often limited to small datasets (i.e., a few thousand items). This paper proposes algorithms for integrat...

Active Learning for Crowd-Sourced Databases

Crowd-sourcing has become a popular means of acquiring labeled data for many tasks where humans are more accurate than computers, such as image tagging, entity resolution, or sentiment analysis. However, due to the time and cost of human labor, solutions that solely rely on crowd-sourcing are often limited to small datasets (i.e., a few thousand items). This paper proposes algorithms for integr...

Publication date: 2011